The plotting library I chose is Altair, for two reasons:

  • Interactive plots created with ipywidgets cannot be displayed in fastpages, as discussed in the forum (link here), and it is also not easy to embed Plotly figures in a fastpages blog.

  • Altair offers a variety of interactive options, such as sliders and dropdowns, which make the plots more engaging.

import pandas as pd
import altair as alt
from vega_datasets import data

Task 1

The first dataset contains malaria death rates by country, for all ages, across the world and over time. The Entity column is the full country name and the Code column is the ISO 3166 alpha-3 code. The head of the data is shown below. Because the data covers both countries and years, the first plot to consider is a map with a year slider.

df = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018/2018-11-13/malaria_deaths.csv')
df.head()
Entity Code Year Deaths - Malaria - Sex: Both - Age: Age-standardized (Rate) (per 100,000 people)
0 Afghanistan AFG 1990 6.802930
1 Afghanistan AFG 1991 6.973494
2 Afghanistan AFG 1992 6.989882
3 Afghanistan AFG 1993 7.088983
4 Afghanistan AFG 1994 7.392472

To draw a map in Altair, we need to add each country's numeric ISO 3166-1 code to the original dataset so that the rows can be matched to the country shapes in the world map. The country_info dataset contains full information about each country, including the FIFA code, the ISO 3166 codes, etc.

country_info = pd.read_csv("https://raw.githubusercontent.com/datasets/country-codes/master/data/country-codes.csv",dtype = 'str')
country_info.head()
FIFA Dial ISO3166-1-Alpha-3 MARC is_independent ISO3166-1-numeric GAUL FIPS WMO ISO3166-1-Alpha-2 ... Sub-region Name official_name_ru Global Name Capital Continent TLD Languages Geoname ID CLDR display name EDGAR
0 TPE 886 TWN ch Yes 158 925 TW TW ... NaN NaN NaN Taipei AS .tw zh-TW,zh,nan,hak 1668284 Taiwan NaN
1 AFG 93 AFG af Yes 004 1 AF AF AF ... Southern Asia Афганистан World Kabul AS .af fa-AF,ps,uz-AF,tk 1149361 Afghanistan B2
2 ALB 355 ALB aa Yes 008 3 AL AB AL ... Southern Europe Албания World Tirana EU .al sq,el 783754 Albania B3
3 ALG 213 DZA ae Yes 012 4 AG AL DZ ... Northern Africa Алжир World Algiers AF .dz ar-DZ 2589581 Algeria B4
4 ASA 1-684 ASM as Territory of US 016 5 AQ AS ... Polynesia Американское Самоа World Pago Pago OC .as en-AS,sm,to 5880801 American Samoa B5

5 rows × 56 columns

Next, we rename the columns of the malaria death dataset and keep only the rows whose country code appears in the country_info dataset.

df.columns = ['name', 'ISO3166-1-Alpha-3','Year','Death_Rate'] # Rename the columns so the key column matches the one in country_info
df_new = df[df['ISO3166-1-Alpha-3'].isin(country_info['ISO3166-1-Alpha-3'])] # Keep only the rows whose code exists in country_info
df_new.head() 
name ISO3166-1-Alpha-3 Year Death_Rate
0 Afghanistan AFG 1990 6.802930
1 Afghanistan AFG 1991 6.973494
2 Afghanistan AFG 1992 6.989882
3 Afghanistan AFG 1993 7.088983
4 Afghanistan AFG 1994 7.392472
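
Before moving on, it can be useful to look at what the `isin` filter throws away. The sketch below uses small made-up tables in place of the real ones (the names `deaths` and `codes` and all values are illustrative assumptions), and splits the rows into kept and dropped with the same boolean mask.

```python
import pandas as pd

# Toy stand-ins for the malaria and country_info tables (illustrative values only).
deaths = pd.DataFrame({
    "name": ["Afghanistan", "Algeria", "Some Region"],
    "ISO3166-1-Alpha-3": ["AFG", "DZA", None],  # a row without a country code
    "Year": [1990, 1990, 1990],
    "Death_Rate": [6.8, 0.1, 120.0],
})
codes = pd.DataFrame({"ISO3166-1-Alpha-3": ["AFG", "DZA"]})

# True where the row's code appears in the reference table.
known = deaths["ISO3166-1-Alpha-3"].isin(codes["ISO3166-1-Alpha-3"])

kept = deaths[known]      # rows that can later be placed on the map
dropped = deaths[~known]  # rows with a missing or unknown code

print(dropped["name"].unique())
```

Running the same check on the real `df` would show which entities do not survive the filter, for example rows with no country code.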

Merge the two datasets and drop the irrelevant columns.

df_final = pd.merge(df_new, country_info , on = 'ISO3166-1-Alpha-3', how = 'left')
df_final = df_final[['name','Year','Death_Rate','ISO3166-1-numeric']]
df_final.head()
name Year Death_Rate ISO3166-1-numeric
0 Afghanistan 1990 6.802930 004
1 Afghanistan 1991 6.973494 004
2 Afghanistan 1992 6.989882 004
3 Afghanistan 1993 7.088983 004
4 Afghanistan 1994 7.392472 004
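
A left merge silently keeps rows that found no match, so a quick sanity check can help. This is a minimal sketch on toy data (the frames and values are assumptions, not the real datasets) using the `validate` and `indicator` options of `pd.merge`.

```python
import pandas as pd

# Toy stand-ins for df_new and country_info (illustrative values only).
deaths = pd.DataFrame({
    "name": ["Afghanistan", "Afghanistan", "Algeria"],
    "ISO3166-1-Alpha-3": ["AFG", "AFG", "DZA"],
    "Year": [1990, 1991, 1990],
    "Death_Rate": [6.8, 7.0, 0.1],
})
codes = pd.DataFrame({
    "ISO3166-1-Alpha-3": ["AFG", "DZA"],
    "ISO3166-1-numeric": ["004", "012"],
})

merged = pd.merge(
    deaths, codes,
    on="ISO3166-1-Alpha-3",
    how="left",
    validate="many_to_one",  # raises if a code appears twice in the lookup table
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Rows that did not find a partner in the lookup table.
unmatched = merged[merged["_merge"] != "both"]
print(len(merged), len(unmatched))
```

If `unmatched` is empty and the row count equals that of the left table, the merge neither lost nor duplicated rows.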

Set the slider

alt.data_transformers.disable_max_rows() # By default Altair refuses datasets with more than 5000 rows; this call disables that limit
countries = alt.topo_feature(data.world_110m.url, 'countries')
# Set the slider: step = 1, min year is 1990, max year is 2016
slider = alt.binding_range(
    step=1,
    min=1990, 
    max=2016
)

select_date = alt.selection_single(
    name="Slider", 
    fields=['Year'],
    bind=slider, 
)
alt.Chart(df_final).mark_geoshape()\
    .encode(color='Death_Rate:Q')\
    .add_selection(select_date)\
    .transform_filter(select_date)\
    .transform_lookup(
        lookup='ISO3166-1-numeric',
        from_=alt.LookupData(countries, key='id', fields=["type", "properties", "geometry"])
    )\
    .project('equirectangular')\
    .properties(
        width=400,
        height=300,
        title='Malaria Death Rate (per 100,000 people) '
    )
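
One thing that may be worth checking in the lookup: because country_info was read with `dtype='str'`, the `ISO3166-1-numeric` values are strings such as `'004'`. As far as I know, the feature ids in `world_110m` are plain integers, so if some countries stay blank on the map, the key types could be the cause. The sketch below, on toy data, shows one way to build an integer key; the column name `id` is an assumption chosen for illustration.

```python
import pandas as pd

# Toy frame with zero-padded string codes, as produced by dtype='str'.
rates = pd.DataFrame({
    "name": ["Afghanistan", "Algeria", "Unknown"],
    "ISO3166-1-numeric": ["004", "012", None],
})

# '004' -> 4.0; missing codes become NaN instead of raising.
rates["id"] = pd.to_numeric(rates["ISO3166-1-numeric"], errors="coerce")

# Drop rows that have no code, then store the key as a plain integer.
rates = rates.dropna(subset=["id"]).astype({"id": int})

print(rates[["name", "id"]])
```

With such a column, the lookup key in `transform_lookup` would be `'id'` instead of `'ISO3166-1-numeric'`.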

From the map above, we can see that most countries have a death rate below 50 per 100,000 people, that countries with a high malaria death rate are mostly in Africa, and that death rates generally decrease over time. However, one problem with this plot is that individual countries are hard to distinguish, so we cannot tell the trend of a single country. Another plot we can try is a line chart with a country dropdown.

df_final['Year'] = pd.to_datetime(df_final['Year'], format='%Y') #Change the year to datetime type
country_list = df_final['name'].dropna().unique()
country_list = country_list.tolist()
dropdown = alt.binding_select(
    options = country_list
)
select_country = alt.selection_single(
    name="dropdown", 
    fields=['name'],
    bind = dropdown, 
)
alt.Chart(df_final).mark_line().encode(
    x='Year',
    y='Death_Rate',
).add_selection(
   select_country
).transform_filter(
    select_country
).properties(
    width = 500,
    height=400,
    title='Malaria Death Rate for a single country (per 100,000 people)'
).configure_axis(
    grid=False
)

From the line plot, we can see how the death rate of each country changes over time.
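
The same trend can also be summarized as a number. As a rough sketch on made-up data (country names and values are placeholders), the code below compares the first and last available year for each country.

```python
import pandas as pd

# Toy death rates for two placeholder countries (illustrative values only).
rates = pd.DataFrame({
    "name": ["Country A"] * 3 + ["Country B"] * 3,
    "Year": [1990, 2000, 2016] * 2,
    "Death_Rate": [10.0, 8.0, 4.0, 2.0, 2.5, 3.0],
})
# Parse the year the same way as in the post, going through strings.
rates["Year"] = pd.to_datetime(rates["Year"].astype(str), format="%Y")

# Sort so that 'first' and 'last' refer to the earliest and latest year.
ordered = rates.sort_values(["name", "Year"])
summary = ordered.groupby("name")["Death_Rate"].agg(first="first", last="last")
summary["change"] = summary["last"] - summary["first"]

print(summary)
```

A negative `change` means the rate fell over the period, which is what the line chart shows visually for the selected country.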

Task 2

The second dataset contains malaria incidence by country, for all ages, across the world and over time. The Entity column is the full country name and the Code column is the ISO 3166 alpha-3 code. The head of the data is shown below.

df1 = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018/2018-11-13/malaria_inc.csv')
df1.head()
Entity Code Year Incidence of malaria (per 1,000 population at risk) (per 1,000 population at risk)
0 Afghanistan AFG 2000 107.100000
1 Afghanistan AFG 2005 46.500000
2 Afghanistan AFG 2010 23.900000
3 Afghanistan AFG 2015 23.600000
4 Algeria DZA 2000 0.037746

As before, we rename the columns of the malaria incidence dataset and keep only the rows whose country code appears in the country_info dataset.

df1.columns = ['name', 'ISO3166-1-Alpha-3','Year','Incidence_Rate']
df1_new = df1[df1['ISO3166-1-Alpha-3'].isin(country_info['ISO3166-1-Alpha-3'])]
df1_new.head()
name ISO3166-1-Alpha-3 Year Incidence_Rate
0 Afghanistan AFG 2000 107.100000
1 Afghanistan AFG 2005 46.500000
2 Afghanistan AFG 2010 23.900000
3 Afghanistan AFG 2015 23.600000
4 Algeria DZA 2000 0.037746
df1_final = pd.merge(df1_new, country_info , on = 'ISO3166-1-Alpha-3', how = 'left')
df1_final = df1_final[['name','Year','Incidence_Rate','ISO3166-1-numeric']]
df1_final.head()
name Year Incidence_Rate ISO3166-1-numeric
0 Afghanistan 2000 107.100000 004
1 Afghanistan 2005 46.500000 004
2 Afghanistan 2010 23.900000 004
3 Afghanistan 2015 23.600000 004
4 Algeria 2000 0.037746 012
alt.data_transformers.disable_max_rows()
countries = alt.topo_feature(data.world_110m.url, 'countries')
slider1 = alt.binding_range(
    step=5,
    min=2000, 
    max=2015
)

select_date1 = alt.selection_single(
    name="slider", 
    fields=['Year'],
    bind=slider1, 
)
alt.Chart(df1_final).mark_geoshape()\
    .encode(color='Incidence_Rate:Q')\
    .add_selection(select_date1)\
    .transform_filter(select_date1)\
    .transform_lookup(
        lookup='ISO3166-1-numeric',
        from_=alt.LookupData(countries, key='id', fields=["type", "properties", "geometry"])
    )\
    .project('equirectangular')\
    .properties(
        width=400,
        height=300,
        title='Malaria Incidence Rate (per 1,000 people) '
    )
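
The slider here uses a step of 5 because the incidence data only has values for 2000, 2005, 2010 and 2015. Rather than hard-coding these numbers, they could be read from the data. The helper below is a hypothetical sketch (`slider_bounds` is not part of Altair or pandas).

```python
import pandas as pd

def slider_bounds(years):
    """Return min, max and step for a year slider from a column of years."""
    unique_years = sorted(int(y) for y in pd.Series(years).unique())
    gaps = {b - a for a, b in zip(unique_years, unique_years[1:])}
    # Use the common gap when the years are evenly spaced; otherwise fall back to 1.
    step = gaps.pop() if len(gaps) == 1 else 1
    return {"min": unique_years[0], "max": unique_years[-1], "step": step}

# Years as they appear in the incidence data, repeated across countries.
incidence_years = [2000, 2005, 2010, 2015, 2000, 2005]
bounds = slider_bounds(incidence_years)
print(bounds)  # {'min': 2000, 'max': 2015, 'step': 5}
```

The resulting dictionary has the same keys as the keyword arguments passed to `alt.binding_range` in this post, so it could be unpacked with `alt.binding_range(**bounds)`.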
df1_final['Year'] = pd.to_datetime(df1_final['Year'], format='%Y') #Change the year to datetime type
country_list1 = df1_final['name'].dropna().unique()
country_list1 = country_list1.tolist()
dropdown1 = alt.binding_select(
    options = country_list1
)
select_country1 = alt.selection_single(
    name="dropdown", 
    fields=['name'],
    bind = dropdown1, 
)
alt.Chart(df1_final).mark_line().encode(
    x='Year',
    y='Incidence_Rate',
).add_selection(
   select_country1
).transform_filter(
    select_country1
).properties(
    width = 500,
    height=400,
    title='Malaria Incidence Rate for a single country (per 1,000 people)'
).configure_axis(
    grid=False
)

Task 3

The third dataset contains malaria deaths by country and age group across the world and over time. The entity column is the full country name and the code column is the ISO 3166 alpha-3 code. The head of the data is shown below.

df2 = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018/2018-11-13/malaria_deaths_age.csv')
df2.head()
Unnamed: 0 entity code year age_group deaths
0 1 Afghanistan AFG 1990 Under 5 184.606435
1 2 Afghanistan AFG 1991 Under 5 191.658193
2 3 Afghanistan AFG 1992 Under 5 197.140197
3 4 Afghanistan AFG 1993 Under 5 207.357753
4 5 Afghanistan AFG 1994 Under 5 226.209363
df2.columns = ['index','name', 'ISO3166-1-Alpha-3','Year','Age_Group','Death_Rate']
df2_new = df2[df2['ISO3166-1-Alpha-3'].isin(country_info['ISO3166-1-Alpha-3'])]
df2_new.head()
index name ISO3166-1-Alpha-3 Year Age_Group Death_Rate
0 1 Afghanistan AFG 1990 Under 5 184.606435
1 2 Afghanistan AFG 1991 Under 5 191.658193
2 3 Afghanistan AFG 1992 Under 5 197.140197
3 4 Afghanistan AFG 1993 Under 5 207.357753
4 5 Afghanistan AFG 1994 Under 5 226.209363
df2_final = pd.merge(df2_new, country_info , on = 'ISO3166-1-Alpha-3', how = 'left')
df2_final = df2_final[['name','Year','Death_Rate','ISO3166-1-numeric','Age_Group']]
df2_final.head()
name Year Death_Rate ISO3166-1-numeric Age_Group
0 Afghanistan 1990 184.606435 004 Under 5
1 Afghanistan 1991 191.658193 004 Under 5
2 Afghanistan 1992 197.140197 004 Under 5
3 Afghanistan 1993 207.357753 004 Under 5
4 Afghanistan 1994 226.209363 004 Under 5

The plot I chose is a chart with a country dropdown, in which each color represents a different age group. From this plot, we can see how the deaths in each age group change over time for the selected country.

df2_final['Year'] = pd.to_datetime(df2_final['Year'], format='%Y')
country_list2 = df2_final['name'].dropna().unique()
country_list2 = country_list2.tolist()
dropdown2 = alt.binding_select(
    options = country_list2
)
select_country2 = alt.selection_single(
    name="dropdown", 
    fields=['name'],
    bind = dropdown2, 
)
alt.Chart(df2_final).mark_point().encode(
    x='Year',
    y='Death_Rate',
    color = 'Age_Group'
).add_selection(
   select_country2
).transform_filter(select_country2)
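
To read the per-age-group values directly rather than from the chart, the long table can be reshaped for one country. This is a small sketch on toy data; the second age-group label and all numbers are illustrative assumptions.

```python
import pandas as pd

# Toy long-format table: one row per country, year and age group.
by_age = pd.DataFrame({
    "name": ["Afghanistan"] * 4,
    "Year": [1990, 1990, 1991, 1991],
    "Age_Group": ["Under 5", "5-14", "Under 5", "5-14"],
    "Death_Rate": [184.6, 10.0, 191.7, 11.0],
})

# Select one country, then put years in rows and age groups in columns.
one_country = by_age[by_age["name"] == "Afghanistan"]
wide = one_country.pivot(index="Year", columns="Age_Group", values="Death_Rate")

print(wide)
```

Each column of `wide` then corresponds to one colored series in the chart above.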